1. INTRODUCTION


The principal function of vision is to measure the environment. As demonstrated by the 
coordination of motor actions with the positions and trajectories of moving objects in cluttered 
environments and by the rapid recognition of solid objects in varying contexts from changing
perspectives, vision provides real-time information about the geometrical structure and location of
environmental objects and events. 

Information about the geometrical structure of scenes, objects, and motions is visually
acquired not only by the exploration of natural environments, but also from artificial, human-designed
displays. Photographs, drawings, movies, computer graphics, and other such artificial
2-D displays are widely and effectively used tools for communicating information about spatial
structure. Understanding the basis for the effectiveness of such tools poses a special theoretical
challenge, because the trigonometric mapping from the 3-D structures and motions portrayed in
these displays to the optical patterns on the observer's retinae differs from the perspective projections
that normally hold for vision in natural environments. Cutting (1987) has discussed
the theoretical difficulties posed by this discrepancy between the projective geometry of such
displays and that of natural vision, and he has also provided experimental demonstrations of the abilities
of humans to perceive 3-D structure in movies viewed "from the front row side aisle."

The purpose of this paper is to examine the geometric information provided by 2-D spatial
displays. We propose that the geometry of this information is best understood not within the
traditional framework of perspective trigonometry, but in terms of the structure of qualitative
relations defined by congruences among intrinsic geometric relations in images of surfaces. The
mathematical details of this theory of the geometry of vision are presented elsewhere (Lappin, in
press); the present paper outlines the basic concepts of this geometrical theory.


1 Work on this paper and on related experimental and theoretical research was supported in part by a Small
Business Innovative Research Grant from NASA to T. D. Wason, by NIH Grant EY-05926 to J. S. Lappin, and by
the University Research in Residence Program of the Air Force Office of Scientific Research, which enabled several
extended visits by Lappin to Wright-Patterson Air Force Base. The mathematical ideas outlined in this paper have
benefited from discussions with Jan Koenderink and Andrea van Doorn, State University of Utrecht,
The Netherlands, and especially with John G. Ratcliffe, Dept of Mathematics at Vanderbilt University.
Traditionally, the structure of space (both the 3-D space of the environment and the 2-D space
of the image) has been regarded as defined a priori, independently of the objects and motions
contained within it. Indeed, the geometric structure of objects and motions is typically described by
reference to extrinsic standards that define parallel and perpendicular directions and quantify rela- 
tive magnitudes of distance extrinsic to the objects themselves. 

When described in terms of this extrinsic framework, however, the geometry of vision is 
quite complicated: Metric 2 relations in the 2-D image plane cannot be isomorphic with metric rela- 
tions in the 3-D environment; the perspective projection from 3-D spatial structures in the environ- 
ment onto the 2-D image plane does not have a well-defined inverse. Therefore, the recovery of 
information about the geometric structures and locations of the environmental objects has often 
been thought to require supplementary information about the perspective position of the observer 
or about the structure and location of the objects. The 2-D optical images alone have seemed 
insufficient. 

But the assumption that vision begins with an abstract structure of space as a prior standard 
for describing environmental objects begs the question. The basic problem of vision is to find a 
measurement structure for representing the spatial characteristics of observed scenes, objects, and 
events. Such a measurement structure is generally not given beforehand, but must be discovered 
in the organization of the empirical observations themselves. 


2. INTRINSIC GEOMETRY OF SURFACES AND IMAGES 


When described in terms of the intrinsic geometry of surfaces, the geometry of vision 
becomes much simpler. In the first place, the mapping of a visible region of an environmental 
surface onto its optical image is a mapping from one 2-D manifold onto another. The derivatives 
and singularities of the surface (its slopes, peaks and valleys, inflections, saddlepoints, and
occluding edges) are isomorphic with the derivatives and singularities of the image. This is true for
images described by gradients of texture, motion parallax, or stereoscopic disparity (Koenderink
and van Doorn, 1975, 1976a,b,c, 1977). Although the isomorphism does not hold for images
described by luminance gradients, partly because of the additional influence of the direction of
illumination, it is still true that the intrinsic surface structure (in particular, the parabolic lines,
which are inflections of curvature that separate regions of convexity and concavity) is systematically
related to the differential structure of the image (Koenderink and van Doorn, 1980). Because
the differential structures of the two manifolds are essentially isomorphic with one another, the 
ordinal topography of the visible region of an environmental surface is fully described and 
recoverable by its optical image. 

Furthermore, the specific mapping between curves and forms on the environmental surface 
and their corresponding images on an observer's retina may be locally described simply by a linear 


2 The term metric is used in a conventional mathematical sense, referring in this context to measures of
distance over a potentially curved surface. A relation m(a,b) between two elements a and b is said to be a metric
relation if it satisfies the following axioms for all elements a, b, and c:
positivity: m(a,b) ≥ 0
symmetry: m(a,b) = m(b,a)
reflexivity: m(a,a) = 0
triangle inequality: m(a,c) ≤ m(a,b) + m(b,c).





coordinate transformation between the derivatives on the two manifolds. This linear approximation
holds for "infinitely small" surface patches that may be locally approximated by a tangent plane at
that location. This linear mapping of the surface onto its image also has a well-defined inverse.
Accordingly, the local structure of the surface may be obtained from the local structure of its image
by a linear coordinate transformation.

These simple relationships between the surface and its image involve the derivatives on the
two manifolds. The linear transformation that best describes the relationship between these two
manifolds at any given point is given by the partial derivatives of the two coordinate systems.
Thus, if $O^2$ represents the 2-D manifold of the object surface, and if $R^2$ represents the 2-D
manifold corresponding to the observer's retina, then the linear differential map $v: O^2 \to R^2$ is
specified by the following Jacobian matrix of partial derivatives:


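$$V = \begin{bmatrix} \partial r^1/\partial o^1 & \partial r^1/\partial o^2 \\ \partial r^2/\partial o^1 & \partial r^2/\partial o^2 \end{bmatrix}$$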


Suppose that $[dO] = [do^1, do^2]$ is a 2x1 column vector that specifies an infinitesimal displacement
on the surface in terms of two intrinsic coordinates on the object surface, and suppose
$[dR] = [dr^1, dr^2]$ is a corresponding description of the image of this vector in terms of the intrinsic
coordinates of the retina. Then the transformation between these two coordinate systems produced
by the optical projection from the object to its image on the retina is given by the linear
equation


[dR] = V[dO] 


and the inverse map is given by 


[dO] = V^{-1}[dR]

where V is the Jacobian matrix given above. (The form of this equation is independent of the
specific coordinate systems used to specify positions on the two manifolds. The coordinates need
not intersect at right angles nor even be straight lines; they need only be differentiable and provide
a unique specification of each position on the manifold. The generality of this representation
seems especially relevant to vision, where no specific coordinate system can be assumed beforehand
for any particular environmental surface. 3) The important point is that the local structure of
the retinal image of a given surface is described by this Jacobian matrix of partial derivatives, V. 
The entries in this matrix vary as a function of position on the surface, with variations in the values 
of these entries reflecting variations in the orientation and curvature of the surface. 
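
As a rough illustration of the local linear map and its inverse, the following sketch applies a 2x2 Jacobian to a small surface displacement and then recovers the displacement from its image. The numerical entries of `V` and `dO` are invented for illustration; they do not come from any particular surface or viewing geometry, and the helper names are hypothetical.

```python
# Minimal sketch, assuming an arbitrary (made-up) local Jacobian V.
# It maps a surface displacement [dO] to a retinal displacement [dR]
# and recovers [dO] again through the inverse matrix.

def mat_vec(m, v):
    """Multiply a 2x2 matrix (nested lists) by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def inverse_2x2(m):
    """Invert a 2x2 matrix; raise ValueError if it is singular."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if abs(det) < 1e-12:
        # A singular Jacobian has no inverse (for example, where the
        # surface patch is viewed edge-on).
        raise ValueError("Jacobian is singular at this location")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


V = [[0.8, 0.1],
     [0.2, 0.5]]           # hypothetical local Jacobian dr/do
dO = [0.01, -0.02]         # small displacement in surface coordinates

dR = mat_vec(V, dO)                          # [dR] = V [dO]
dO_recovered = mat_vec(inverse_2x2(V), dR)   # [dO] = V^{-1} [dR]
```

In this sketch the recovered displacement agrees with the original to within rounding error, which is all that the inverse relation asserts for a nonsingular local Jacobian.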

The same approach can also be used to describe the relationships with a third 2-D manifold
associated, for example, with an intervening display image such as a movie. Suppose
that $I^2$ represents the manifold of such an intervening image, that $a: O^2 \to I^2$ represents the

3 For concreteness we may assume that the coordinates reflect the spatial arrangement of the gradients and
singularities of the surface, e.g., tending to run parallel and perpendicular to the gradients of curvature of the surface
and to the boundary contours, corners, and parabolic lines (which separate structurally distinct regions). We need not
assume that the coordinates have specific numerical values, only that they are differentiable and uniquely label
every location on the surface.
differential map between these two manifolds, and that $b: I^2 \to R^2$ is the visual map from the display
image onto the retinal manifold. Then, using the chain rule, the two successive maps can be
combined by a composition of the two functions, $v = (b \circ a): O^2 \to R^2$. Similarly, the coordinate
transformation corresponding to this chain would be given by a linear equation of the following
form:


[dR] = BA[dO],

where the matrix product BA = V again provides a linear coordinate transformation functionally 
equivalent to the previous construction. 
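
The composition of the two maps can be checked numerically. The sketch below uses two invented Jacobians, `A` (surface to display) and `B` (display to retina), and confirms that applying them in succession gives the same retinal displacement as applying the single product matrix `BA`. All numerical values and helper names are assumptions chosen only for illustration.

```python
# Minimal sketch of the chain rule for the two successive linear maps.
# A and B are made-up Jacobians, not measurements of any real display.

def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_vec(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [sum(m[i][k] * v[k] for k in range(2)) for i in range(2)]


A = [[1.0, 0.3],
     [0.0, 0.7]]   # hypothetical map a: surface -> display image
B = [[0.9, 0.0],
     [0.2, 1.1]]   # hypothetical map b: display image -> retina

V = mat_mul(B, A)  # combined map, V = BA

dO = [0.01, 0.02]
via_display = mat_vec(B, mat_vec(A, dO))  # apply a, then b
direct = mat_vec(V, dO)                   # apply the product once
```

The order of the product matters: the map applied first (`A`) stands on the right, as in the equation [dR] = BA[dO].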

Representation of the metric structure of the surface requires an embedding of the 2-D manifold
of the surface or its image into the 3-D manifold of Euclidean 3-space, $E^3$. Suppose that
$[dX] = [dx^1, dx^2, dx^3]$ is a 3x1 column vector giving the three orthogonal cartesian coordinates
of an infinitesimal displacement on the object surface. Then the perspective coordinate embedding
of the image of the surface into $E^3$, $p: R^2 \to E^3$, is given by a linear coordinate transformation of
the following form:


[dX] = PV[dO] 

where P is a 3x2 matrix of partial derivatives, $P = [\partial x^k/\partial r^i]$, with k = 1,2,3 and i = 1,2.
Measures of metric relations require a quadratic expression similar to the Pythagorean formula for 
distance in E 3 . The metric tensor that provides the measure of distance on the surface is obtained 
by substituting from the above equation for the vector [dX] in the Pythagorean formula: 

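$$\begin{aligned}
ds^2 &= [dX]^T [dX] \\
     &= [PV[dO]]^T PV[dO] \\
     &= [dO]^T V^T P^T P V [dO] \\
     &= [dO]^T V^T P^* V [dO],
\end{aligned}$$

where $P^* = P^T P$ is a symmetric 2x2 matrix with quadratic entries of the form

$$P^* = \left[ \sum_k (\partial x^k/\partial r^i)(\partial x^k/\partial r^j) \right].$$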

Thus, the entries in this matrix provide a measure of squared distance on the object surface at a 
particular position on the retina corresponding to the image of the surface. The length of any arbi- 
trary curve on the surface is obtained by integrating the quantities ds defined in the preceding 
equation at each position along the curve. 
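
A small numerical sketch may make the quadratic form concrete. It assumes a simple case that is not taken from the text: a plane slanted by 60° about the horizontal image axis, viewed under parallel (orthographic) projection, with orthonormal coordinates on the plane. Under these assumptions the metric $V^T P^* V$ should come out as the identity, and the summed lengths ds along a unit semicircle on the surface should approach π. The function names are hypothetical.

```python
import math

# Minimal sketch, assuming a slanted plane under parallel projection.
# Surface point: X = (o1, o2*cos(s), o2*sin(s)); image point: r = (x1, x2).

def transpose(m):
    """Transpose a matrix given as nested lists."""
    return [list(row) for row in zip(*m)]


def mat_mul(a, b):
    """General matrix product for nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def metric_tensor(P, V):
    """Return G = V^T P^T P V, the 2x2 metric in surface coordinates."""
    P_star = mat_mul(transpose(P), P)
    return mat_mul(mat_mul(transpose(V), P_star), V)


def ds(G, dO):
    """Length of a small displacement dO measured with the metric G."""
    q = sum(dO[i] * G[i][j] * dO[j] for i in range(2) for j in range(2))
    return math.sqrt(q)


slant = math.radians(60.0)
V = [[1.0, 0.0], [0.0, math.cos(slant)]]              # dr/do
P = [[1.0, 0.0], [0.0, 1.0], [0.0, math.tan(slant)]]  # dx/dr
G = metric_tensor(P, V)


def curve_length(G, n=1000):
    """Sum ds along the semicircle o(t) = (cos t, sin t), 0 <= t <= pi."""
    total = 0.0
    for i in range(n):
        t0, t1 = math.pi * i / n, math.pi * (i + 1) / n
        dO = [math.cos(t1) - math.cos(t0), math.sin(t1) - math.sin(t0)]
        total += ds(G, dO)
    return total


length = curve_length(G)
```

The foreshortening in `V` (a factor of cos 60°) is exactly undone by the embedding term in `P*`, which is why the surface metric is recovered even though the image itself is compressed.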

The three independent parameters of the matrix P* are not given directly by a single station- 
ary image of an isolated local surface patch. In certain special cases these perspective parameters 
and therefore the metric structure of the local surface patch are determined, up to a scalar, simply 
by the motion of the local patch. More generally, however, these perspective parameters must be 
derived from more global constraints on the image structure associated with the observer's position 
and motion within the 3-D environment. In general, the perspective embedding of the image into
$E^3$ is revealed by actual or implied motions of objects within the space.


3. METRIC STRUCTURE FROM CONGRUENCE 


Although geometric relations are often described in terms of extrinsic coordinate systems in 
which directions and distances are defined a priori, it is important in many applications to derive 
the structure of space from more fundamental qualitative relationships among the objects [...]
were used to construct spaces in which the lawful relations among [...]
laws of nature.

Analogously, the geometry of vision may also rest upon the symmetries of intrinsic qualitative
relations [...] objects.

[...] 1987). Gibson's (1950) conception of the visual information provided by such [...]
associated with the motions
of objects and observers in 3-D space.

The essential ideas underlying this conception of geometry were described by the mathemati- 
cian Killing (1892) 4 : 

Every object covers a space at every time. The space covered by one object 
cannot simultaneously be covered by another object. 


4 We are grateful to Jan Koenderink for bringing this paper to our attention and to Bernd Rossa for translating
the paper from the original German.
Every object can be moved. If an object covers the space of a second object at
any time, then the first object can cover any space covered by the second object at any
(other) time.

Every space (object) can be partitioned. Each part of a space (object) is again a 
space. If A is a part of B and B is a part of C, then A is a part of C, where A,
B, and C may be either spaces or objects. [p. 128]

These three principles, which are the first of eight principles from which Killing derives a general 
theory of geometry, provide qualitative criteria for defining the equality or congruence of spaces 
and objects: Two spaces are congruent if and only if they can be covered by the same object. Two 
objects are congruent if and only if they can cover the same space. Thus, objects and spaces con- 
stitute mutually interdependent relational structures. The metric structure of both may be derived 
from elementary qualitative properties of differentiability and congruence under motion. (By defi- 
nition, "motions" are isometric transformation groups.) (Also see Weyl, 1952, and 
Guggenheimer, 1963, Sect. 11-2.) 

This conception of form and space provides a basis for understanding how visual informa- 
tion about the metric structure and dimensionality of objects and spaces may be gained from 
"motions" or transformations which bring objects at one position in space into congruence with 
those at other positions. The metric equality of neighboring spaces successively occupied by the 
same object and the equality of separate parts of an object which successively occupy the same 
space may be determined from the motions of objects. Accordingly, the dimensionality of visible 
spaces and objects need not be restricted to the two coordinate dimensions of the image. Rather, 
the dimensionality may be associated with the number of parameters needed to bring an object at 
one location in space into congruence with an object at another location. 

In certain special cases the metric structure of a given surface patch may be locally determined 
(up to a scalar) by its moving images, independent of global properties of the retinal image as a 
whole. If the trajectory of the moving patch is also a surface in space-time with constant curvature 
equal to that of the object patch, then of course the metric tensor for this spatio-temporal surface 
remains constant over the surface. Motion of the object patch from one region of the spatio- 
temporal surface to another does not change the mapping of the surface onto the retina, and the 
contravariant tensor coefficients for this projective mapping of the object patch and its trajectory 
onto the retina vary only as one-parameter functions of time. Accordingly, the perspective parameters
for embedding the retinal images of this surface into $E^3$ also vary as one-parameter functions
of time or of retinal position (which are correlated in this case). The simplicity of these relationships
between the differential structure of the object surface, its trajectory in space-time, and the
retinal images of these surfaces involves sufficiently few unknown perspective parameters that
these are determined by the invariance of the metric tensor of the surface patch under motion. That
is, suppose that $V_0$ and $P_0$ are the Jacobian matrices for the visual and perspective coordinate
transformations, respectively, for an initial retinal image of the surface patch, and suppose that $V_t$
and $P_t$ are the corresponding matrices for a second retinal image of the same surface patch following
a one-parameter motion onto another position along its constant-curvature trajectory. The
equivalence of the geometric structure of the two retinal images can be expressed by the equation

$$V_0^T P_0^* V_0 = V_t^T P_t^* V_t$$
where $P_t = m(P_0)$ is a one-parameter transformation of $P_0$. This matrix equation involves four
independent linear equations in four unknowns: the three independent perspective parameters of
$P_0$ and the transformation of these by the parameter t.

Specific examples of this special case include a sphere that rotates around an axis (away
from the direction of gaze) through its center (e.g., Lappin, Doner, and Kottas, 1980; Doner,
Lappin, and Perfetto, 1984) and planar patterns that rotate within the same plane (Lappin and
Fuqua, 1983) tilted with respect to the retinal image. In both of these cases the time-varying
positions of the surface patch form a surface of revolution in space-time generated by a one-parameter
transformation group (the magnitude of the rotation). In general, the metric tensor of
the images of the moving surface patch remains invariant under the motion (i.e., its Lie derivative
is zero) if and only if the vector field of this group of isometries (the Killing vectors) is a one-parameter
group that generates a surface of revolution (Guggenheimer, 1963, pp. 272-273).

Thus, because the moving object forms a surface whose images are generated by a one-parameter
transformation, the perspective parameters for embedding this spatio-temporal surface into $E^3$ are
determined up to a scalar by the invariant metric structure of the given surface patch. Indeed, the
experimental results of Lappin, Doner, and Kottas (1980) and Doner, Lappin, and Perfetto
(1984), for the perceived shape of a random-dot sphere rotating about a vertical axis through its
center, and of Lappin and Fuqua (1983), for the perceived inter-point distances among three collinear
points rotating in a plane, demonstrated just this invariance of visually perceived metric structure
under motion even though the optical displays contained unnaturally exaggerated amounts of polar
projection.

In general however, the metric structure of moving objects cannot be recovered from only 
local properties of their retinal images. Instead, the perspective parameters of the projection from 
$E^3$ onto the retina must be recovered from more global constraints on the images. Perspective projection
from $E^3$ onto a plane produces a hyperbolic geometry in the plane, where mutually parallel lines
converge toward a common vanishing point and all sets of parallel lines converge toward a common
horizon line. The position of this horizon line in the visual field is equal to the observer's
eye-height. Accordingly, all lines parallel to the observer's motion through the 3-D environment
converge toward a common vanishing point on the horizon that specifies the observer's momentary
position and trajectory through the visible environment. The images of such parallel lines in $E^3$ are
generated by the retinal image trajectories of features of stationary environmental objects as the
observer moves through the environment. Thus, the location of this horizon line and of such vanishing
points constitute parameters that characterize the given hyperbolic space and its relation to
$E^3$. Like Euclidean space, hyperbolic space is also characterized by congruence and isometry of
form under motion. Thus, congruence relations among visible forms must specify this global perspective
embedding of the retinal image into $E^3$. Although we have not yet completed the mathematical
analysis of this situation, the following illustrations may help to convey the rationale for
this conception of the geometry of vision. 

4. CONGRUENCES IN IMAGES 


The potential for constructing spaces from congruences among imaged forms has been wonderfully
illustrated by M. C. Escher. For example, he has often used translational symmetry of a
replicated form to define a 2-D plane. Both the metric structure of this space and also its 3-D orientation
parallel to the image plane are specified by the translational symmetry. The elementary
component form is also defined by its recursion in the image rather than by the familiarity of the
form itself.


Symmetries in 3-D Euclidean space are exhibited in figure 1, where the congruence of swan- 
like component forms is obtained by translations and rotations in a 3-D space. The 3-D metric 
structure of the space is implied by the congruence of the recurring forms in separate regions of the 
space. The perspective mapping of this space onto the 2-D image plane is also induced by this 
congruence of the component forms. Thus, the perspective trigonometry is derived from the con- 
gruence; the fundamental property is the congruence rather than the trigonometry. 

In the preceding example, congruence is defined among stationary and concurrent forms.
The "motion" that brings a form in one location into congruence with a form in another location is
abstract, rather than an actual trajectory in space-time. If one generalizes the concept of an image 
from a stationary 2-D spatial array to a space-time volume in which the spatial structures are 
extended in time, then the same principle of congruence illustrated in Escher's art can be applied to 
the specification of spaces by the motions of single forms. 

The schematic diagram in figure 2 illustrates three conceptually different types of congruence 
in images. Figure 2B is like that in the Escher print, where the image is a stationary 2-D pattern in 
which a single cube-like structure is recursively positioned at a sequence of neighboring spatial 
positions. The 3-dimensionality of the space is induced by the continuous linear change in the 2-D 
lengths of the contours of the cube as a function of its position in the image plane. This linear 
relation between 2-D length and position corresponds to a particular perspective mapping of 3-D 
space onto the image plane. Thus, the continuous linear relation among neighboring regions of the 
image of a single connected surface specifies the perspective mapping of a 3-D space onto the 2-D 
image. 
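
The linear relation between 2-D length and image position can be illustrated with a toy pinhole projection. The sketch assumes (none of this is specified in the text) a flat ground plane, an eye at height `eye_height` above it, a focal length `focal`, and congruent objects of fixed width resting on the ground at several depths. Under these assumptions both the image width and the image offset below the horizon scale as 1/depth, so their ratio is constant.

```python
# Minimal sketch, assuming a pinhole projection of congruent objects
# resting on a flat ground plane. All parameter values are hypothetical.

def project_on_ground(width, depth, eye_height=1.6, focal=1.0):
    """Return (y, w_img): image offset below the horizon and image width."""
    y = focal * eye_height / depth   # offset of the ground contact point
    w_img = focal * width / depth    # projected width of the object
    return y, w_img


# The same 0.5-unit-wide object placed at four depths along the ground.
samples = [project_on_ground(0.5, z) for z in (2.0, 4.0, 8.0, 16.0)]

# Image width divided by image offset: constant if the relation is linear.
ratios = [w_img / y for y, w_img in samples]
```

In this toy case the constant ratio equals the object width divided by the eye height, so the slope of the length-versus-position relation carries information about the perspective mapping, in the spirit of the argument above.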

In figure 2A the same perspective mapping is defined by a temporal sequence of spatial 
images as the cube is translated through space from position $P_1$ to position $P_n$. The linear transformation
that corresponds to the perspective projection of a plane slanted in depth is now specified
by a function in space-time, though the geometric relation between the image and the depicted
space obviously is essentially the same as in figure 2B. In both cases, relationships among neighboring
image regions correspond to relationships among neighboring regions of a smooth surface.
The perspective relation between the image and the 3-D space in which the surfaces, objects, and 
motions reside is specified by the linear relationship between the lengths of the contours and their 
positions in the image. 

Figure 2C illustrates a slightly different case in which the structure of a space is specified by 
congruences among simultaneous motions of separate forms at separate locations in the image, as 
if the forms were connected and moved in 3-D space. This situation might be produced, for 
example, by motions of the observer or image plane (e.g., a movie or video camera) within a 3-D 
environment. In this example two cubes, at positions $P_1$ and $P_n$ in 3-D space, are simultaneously
displaced in a sequence of four successive translations. The perspective mapping from the 3-D
space in which these events occur onto the 2-D image of the events may be specified by the functional
relation between the magnitudes of the velocities and their locations in the image plane.
Although the forms at positions $P_1$ and $P_n$ in this particular illustration are both cubes that are
potentially congruent under the same transformations that would bring the motions of the two
cubes into congruence, this spatial congruence is not necessary and provides in this case an additional
redundant specification of the perspective transformation of the 3-D space onto the 2-D
image.


The geometric relation between the concurrent motions of just two forms as in figure 2C is 
not generally sufficient to specify the perspective transformation that has yielded the observed 
spatio-temporal image. By the fundamental theorem of plane perspectivity (Delone, 1963), the
perspective mapping of four points in general position (where no three points are collinear) in one
image plane onto a corresponding set of four points in another image plane is necessary and sufficient
to ensure that all of the remaining points are in isometric correspondence in the two planes.
Thus, for a set of four or more points in a single plane, the concurrent motions of the images of
these points in another plane are in principle sufficient to specify the perspective transformation
between these two planes and to specify the metric structure of the spatial relations within these
planar images.
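
The four-point result can be sketched computationally. The code below recovers a plane-to-plane projective transformation (a 3x3 matrix, here called `H`) from four point correspondences by solving the standard 8x8 linear system, then checks that a fifth point is mapped correctly. The "true" matrix `H_true`, the point coordinates, and the function names are invented for illustration, and the sketch assumes the lower-right entry of the matrix can be normalized to 1.

```python
# Minimal sketch: estimate a plane projective map from four point pairs.
# H_true and all coordinates are hypothetical illustration values.

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("points are not in general position")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def homography(src, dst):
    """Return the 3x3 map (last entry fixed at 1) taking src to dst."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        rhs.append(v)
    h = solve(rows, rhs)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(H, p):
    """Map the point p = (x, y) through H with perspective division."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


H_true = [[1.0, 0.2, 0.1], [0.1, 0.9, -0.2], [0.05, 0.1, 1.0]]
src = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
dst = [apply_homography(H_true, p) for p in src]

H_est = homography(src, dst)
probe = (0.3, 0.7)                      # a fifth point, not used in fitting
expected = apply_homography(H_true, probe)
predicted = apply_homography(H_est, probe)
```

Four correspondences give eight equations for the eight free entries of the matrix, which is why no additional points are needed once the four are in general position.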

This geometric relationship endows spatial as well as moving images with considerable 
capacity for carrying information about the geometric structure of the environmental surfaces 
depicted in the images: The geometric structure of an infinitesimally small patch on any arbitrarily 
curved but smooth surface may be locally approximated by a tangent plane at that location, and the
perspective mapping of this tangent plane onto an image plane may be described by a linear
coordinate transformation. The parameters of this linear transformation vary with the relative 3-D
orientation (the direction of tilt and the magnitude of slant) and distance of the environmental surface
in relation to the image plane. The perspective parameters which embed the image of the surface
into $E^3$ and thereby determine the metric structure of the surface are those parameters that will
yield the self-congruence of the same object at different locations within the depicted scene. 


5. EXPERIMENTAL EVIDENCE


In addition to the evidence provided by the illustrations, by everyday visual experience in 
viewing both natural environments and artificial spatial displays, and by the capabilities of moving 
observers to coordinate their actions with the identities, positions, and trajectories of environmental 
objects, the hypothesis that perceived geometric structure derives from the congruences of moving
and movable objects is also supported by experimental evidence. A vast amount of experimental
evidence appears consistent with this hypothesis, but we mention here only a few experiments that 
seem to provide more direct support for this hypothesis. 

One of the relevant investigations is that of Cutting (1987). Judgments of the apparent rigidity
of rotating rectangular solids were evaluated in a variety of experimental display conditions,
including both rigidly and nonrigidly rotated figures and displays that simulated varying degrees of
polar versus parallel projection, and varying degrees of slant of the projection screen relative to the
direction of the perspective convergence point. He found good discrimination of rigid versus nonrigid
figures in displays with approximately parallel projection, essentially independent of the
degree of simulated screen slant (90°, 67°, 45°, or varying between 80° and 55°), even when the
simulated slant was varied sinusoidally during a given trial. Although the figures appeared to
move nonrigidly in conditions with polar projection onto screens slanted at 45°, the results generally
demonstrated the robustness of perceived structural rigidity under at least moderate screen
slants and moderate viewing distances. These results challenge many conventional assumptions
about the geometrical information for perceiving the spatial structure of form. Cutting concludes
that these results probably reflect the insensitivity of vision to the distortions produced by optical
projections, but this interpretation rests upon assumptions about the definition of visual space by
the metric structure of 2-D display screens and retinae. An alternative interpretation is that vision is
very sensitive to spatial relations defined in another way: by the congruence of form under perspective
transformations.

Evidence that vision is indeed very sensitive to the spatial structure of moving forms and that 
this structure is associated with invariant spatial relations in depth rather than the projected 2-D 
positions is provided by experiments reported previously by Lappin and Fuqua (1983). They 
evaluated observers’ acuities in detecting a displacement (a stationary offset in 3-D space) of a 
point from the 3-D center of an imaginary line segment defined by moving patterns of three 
collinear points. The points were rotated in computer-controlled CRT displays as if around an axis 
slanted in depth by amounts varying between trials from 0° (no slant) to 60°. Very small displacements
were accurately detected: displacements greater than 1% of the 3-D distance between the two
outer points could be detected above chance, and displacements of 4% were detected at approximately
90% accuracy. The essential 3-dimensionality of the perceived spatial relations was
demonstrated by the following findings: (1) Detection accuracy was independent of either the
magnitude or variability of the slant of the axis of rotation in depth. (2) Distance-like measures of
the detection accuracy (similar to the signal detectability measure d') were linearly related to the
physical distance of the displacement in 3-D space, with discriminability being proportional to
physical displacement distances above about 1%. (3) The accuracy for detecting any given displacement
was the same in displays with parallel and with polar perspective, although in the latter
displays points centered in 3-D depth were not centered in the projected 2-D images. The differences
in spatial positions between the parallel and polar displays were visually resolvable, however.
(4) When the task required detection of displacements from the projected 2-D centers of the
line segments in displays with polar projections, accuracies were not significantly above chance.
The subjective appearance of the latter displays was that the three points were still seen as rotating 
in depth, but the middle point appeared neither centered nor rigidly attached to the two outer points. 

Thus, these findings suggest that vision may often be unaffected by the 2-D optical "distor- 
tions" in cinema not merely because these spatial differences cannot be resolved by vision, but 
because they do not constitute the geometrical information for perceiving the spatial structure of 
moving patterns. Apparently, perceived spatial structure derives from congruences of form under 
perspective transformations. 

Evidence about the role of such congruences in stereoscopic form perception has been pro- 
vided by recent experiments described by Lappin (in preparation). The purpose of these experi- 
ments was to determine whether the stereoscopic perception of 3-D structure might be shaped by 
the congruences of form associated with motion in depth, rather than by the binocular disparities as 
such. The experiments were motivated by the theoretically challenging fact that for any given 
magnitude of binocular disparity between the horizontal separations of a pair of points in each eye, 
the associated separation in depth increases rapidly and nonlinearly with the viewing distance from 
the observer to the points in question: How then is the stereoscopic perception of form and depth 
calibrated for variations in viewing distance? Does this require "interpretations" of retinal disparities
based on extra-retinal information about the viewing distance? Alternatively, might the perceived
geometric structure of surfaces in depth be based on the invariance of the intrinsic geometric
structure of the surface under the perspective transformations associated with stereoscopic dispar- 
ities and with motions in depth? The theoretical problem is related to those in understanding the 
apparent "paradoxes" of cinema. 
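
The nonlinear dependence of depth on viewing distance can be illustrated with a short calculation. This is a sketch under stated assumptions, not part of the original experiments: both points are assumed to lie straight ahead on the median plane, the interocular separation is taken as 0.065 m, and disparity is defined as the difference between the vergence angles of the two points. The function name `depth_interval` is hypothetical.

```python
import math


def depth_interval(disparity_rad, viewing_distance, interocular=0.065):
    """Depth separation (m) between a near point at `viewing_distance` (m)
    and the farther point that differs from it by `disparity_rad`.

    Assumes both points lie on the median plane, so the vergence angle of
    a point at distance D is 2 * atan(interocular / (2 * D)).
    """
    near_half_angle = math.atan(interocular / (2.0 * viewing_distance))
    far_half_angle = near_half_angle - disparity_rad / 2.0
    if far_half_angle <= 0.0:
        # The disparity is at least the whole vergence angle of the near
        # point, so the far point lies at optical infinity.
        return math.inf
    far_distance = interocular / (2.0 * math.tan(far_half_angle))
    return far_distance - viewing_distance


disparity = math.radians(1.0 / 60.0)  # one minute of arc, an arbitrary choice
for distance in (0.5, 1.0, 2.0, 4.0):
    depth = depth_interval(disparity, distance)
    print(f"viewing distance {distance:.1f} m -> depth interval {depth * 1000:.2f} mm")
```

For small disparities the result is close to disparity × distance² / interocular separation, so doubling the viewing distance roughly quadruples the depth interval associated with the same disparity.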

In one of these experiments, observers were presented with two very slightly different
ellipses, in which the vertical axis was either 3% greater or less than the length of the horizontal
axis. These ellipses were displayed as if in a plane slanted in depth by either 50° or [illegible],
varying randomly from one trial to the next. Thus, the projected forms were always elliptical,
depending on the magnitude of the slant as well as on the shape of the ellipse as measured in its
own plane in depth. Stereoscopic information about the shapes and slants of these patterns was
disrupted by random variations in the magnitude of the disparities with which the forms were
displayed, using disparities that were appropriate for either one-half or one-quarter of the actual
viewing distance at which the patterns were seen. Thus, there were eight alternative stimulus
patterns, which randomly varied between trials.

There were four main experimental conditions, in which the forms were either rotated in
depth or were stationary, and in which the experimental task was either shape discrimination
between the two alternative ellipses or disparity discrimination between the two alternative
disparities. If stereoscopic information about 3-D structure is scaled by the congruences of moving
forms, then shape discrimination should be accurate when the forms were rotated in depth,
independently of the distortions and variability produced by the exaggerated binocular disparities.
Indeed, this is just what happened: Shape discriminations were very accurate when the forms were
moving and were uncorrelated with the variations in either slant or disparity. Not surprisingly,
shape discriminations were near chance accuracy when the forms were stationary, because of the
perceptually inseparable conjoint effects of variations in slant and disparity. For the disparity
discrimination task, however, motion had the opposite effect: Discriminations between the two
alternative disparity values were more accurate for the stationary than for the moving forms,
evidently because the congruence of the moving forms tended to obscure differences between the
stationary disparity spaces.

Thus, these results indicate that the visual scaling of 3-D structure from stereoscopic disparity
derives from the congruences of the perspectively changing forms. Analogous to the case for
stationary pictures and optic flow patterns, binocular disparity per se may have only an indirect
relation to the perceived depths.

Figure 1.- "Swans," etching by M. C. Escher, 1956. © 1988, M. C. Escher heirs/Cordon Art-
Baarn-Holland.

Figure 2.- Three types of congruences in images. (A) A cube in position P1 is moved by a
temporal sequence of displacements through 3-D space to position Pn. A single object appears
[illegible] through space-time. (B) The same cubic form as in A appears simultaneously
in positions P1 and Pn, connected in this case by a spatial series of cubes. The 3-D space is
defined by the congruences of the spatial series of repeated component forms. (C) Two
objects are moved concurrently by a sequence of displacements as if [illegible]. The
3-D structure of the space is indicated in this case by the congruence of the motions in the
separate spatial regions rather than by the congruences of the spatial forms as in the other two
panels.